Determinants of School Enrollment and School Performance*

John Conlisk

Institute for Research on Poverty
University of Wisconsin 

Introduction 

This paper analyzes 1960 Census data on school enrollment and school perfor- 
mance of age 5 to 19 children in the United States. School enrollment here refers 
to whether a child is or is not enrolled in school; and school performance refers
to whether an enrolled child is behind, with, or ahead of his age group in years
of schooling completed. Demographic variables describing age, color, sex, rural- 
urban status, education of parents, and income of parents are used to explain 
variation in school enrollment and performance across the school age population. 
Since these explanatory variables are almost completely outside the control of 
the children themselves, their explanatory power measures the lack of equal 
educational opportunity the children face. Since the explanatory variables are
also largely outside the short-run control of would-be policy-makers, their 
explanatory power also indicates to some extent the difficulty of educational 
policy to improve school enrollment and performance. Nonetheless, the importance 
of the parental income variable is somewhat encouraging evidence for
income-supplementing policies for the poor. If supplements to poor parents' incomes
tend to improve their children's school enrollment and performance, then the
supplements will tend to have desirable second generation effects on poverty.

*The research reported here was supported by funds granted to the Institute
for Research on Poverty at the University of Wisconsin by the Office of
Economic Opportunity pursuant to the provisions of the Economic Opportunity
Act of 1964. The conclusions are the sole responsibility of the author.
Thanks are due to Harold Watts for helpful comments on an earlier draft
of the paper.

The school enrollment variable used in the theoretical model below is 
defined as the probability that a child of given characteristics will be 
enrolled in school. The school performance variable used is based on two 
probabilities— (i) the probability that an enrolled child has skipped ahead 
of his age group in years of school completed, and (ii) the probability that 
he has flunked or otherwise fallen behind his age group in years completed. 

For simplicity, the two probabilities will not be treated separately; rather
the model will deal with their difference (the first minus the second). Thus,
the school performance variable will be a "net skip probability" (or the negative
of a "net flunk probability," which is perhaps more descriptive, since
flunks far outnumber skips). A simple two equation model will be specified
which makes the enrollment variable and the school performance variable depend 
on the demographic explanatory variables (age, sex, etc.); and the model will 
be fit via ordinary regression analysis to the 1960 Census data. 

Table 1 presents the cut-off points used by the Census in deciding when a 
child is behind or ahead of his age group in years of schooling completed. It 
should be stressed that the school performance variable to be used is only a 
very rough indicator, for at least two reasons. First, the standard of perfor- 
mance for a given child in determining skipping and flunking is the average 
ability of his classmates; and this varies greatly and systematically from 
school to school. Second, neither skipping nor flunking is an automatic conse- 
quence of a superior or inferior performance by a child. Nonetheless, any 
school performance data collected on the complete scale of the U. S. Census 
seems to deserve at least as much attention as it is given here. 
Table 1. Cut-off Points in Defining Relative Progress Rate

                    Year in which Enrolled*

Age    Behind Age Group    With Age Group    Ahead of Age Group
 7     none                1 and 2           3 or more
 8     1 or less           2 and 3           4 or more
 9     2 or less           3 and 4           5 or more
10     3 or less           4 and 5           6 or more
11     4 or less           5 and 6           7 or more
12     5 or less           6 and 7           8 or more
13     6 or less           7 and 8           9 or more
14     7 or less           8 and 9           10 or more
15     8 or less           9 and 10          11 or more
16     9 or less           10 and 11         12 or more
17     10 or less          11 and 12         13 or more
18     11 or less          12 and 13         14 or more
19     12 or less          13 and 14         15 or more

Source: Page IX of [4].
*The numbers 1 to 8 refer to the eight years of grade school, 9 to 12 to
the four years of high school, and 13 and up to college.
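
The cut-off points of Table 1 follow a regular pattern: at age a, a child is behind if enrolled in year a - 7 or lower, with his age group in years a - 6 and a - 5, and ahead in year a - 4 or higher. A minimal sketch of that rule follows; the function name `progress_status` is hypothetical and is not part of the Census report.

```python
def progress_status(age: int, grade: int) -> str:
    """Classify an enrolled child as 'behind', 'with', or 'ahead' of the age group.

    age   -- age in years, 7 through 19 (the range covered by Table 1)
    grade -- year in which enrolled (1-8 grade school, 9-12 high school,
             13 and up college)
    """
    if not 7 <= age <= 19:
        raise ValueError("Table 1 covers ages 7 through 19 only")
    if grade < 1:
        raise ValueError("grade must be 1 or higher")
    if grade <= age - 7:      # e.g. age 8: year 1 or less
        return "behind"
    if grade <= age - 5:      # e.g. age 8: years 2 and 3
        return "with"
    return "ahead"            # e.g. age 8: year 4 or more


if __name__ == "__main__":
    for age, grade in [(7, 1), (12, 5), (19, 15)]:
        print(age, grade, progress_status(age, grade))
```

Note that no 7-year-old can be classified as behind, which matches the "none" entry in the first row of the table.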



1. The Model 









Let \(r_t\) be the probability that a child of age \(t\) is enrolled in school; and,
given that he is enrolled, let \(p_t\) be the difference of (i) the probability that
he is ahead of his age group and (ii) the probability that he is behind his age
group. Let \(x\) be a column vector of demographic variables describing the child's
characteristics, where \(x\) includes variables for color, sex, rural-urban status,
education of parents, and income of parents. It is assumed that \(x\) does not change
with the group's age \(t\). Partly, this assumption is justified by the genuine
constancy of most of the x-variables listed; and partly, the assumption is
forced on the model by the limitations of the data used. The model is as
follows:

\[ r_t = \alpha_t + \beta_t' x + \gamma_t p_{t-1} + u_t \qquad (\gamma_t > 0) \tag{1} \]

\[ \Delta p_t = a_t + b_t' x + c_t p_{t-1} + v_t \qquad (\text{sign } b_t \text{ same for all } t;\ c_t \ge 0) \tag{2} \]

Here \(\alpha_t\), \(\beta_t\), \(\gamma_t\), \(a_t\), \(b_t\), and \(c_t\) are parameters, \(\beta_t\) and \(b_t\) being column vectors
with the same dimension as \(x\). Thus, the parameters may vary with the group's
age \(t\). The variables \(u_t\) and \(v_t\) are random error terms. Some assumptions about
parameter values are put in parentheses next to the equations.

The model traces out the values of \(r_t\) and \(p_t\) for a child as his age \(t\)
progresses. Equation (1) states that, at age \(t\), the child's enrollment probability
is a linear function of the demographic characteristics \(x\) and the previous
period's performance \(p_{t-1}\), plus an error. The performance variable \(p_{t-1}\) is
included in equation (1) with a positive coefficient, since students who have
done well in school in the past seem more likely to continue their education.
Equation (2) determines \(\Delta p_t = p_t - p_{t-1}\). Since \(p_t\) measures cumulative past
performance, \(\Delta p_t\) measures current performance. Equation (2) thus states
that current performance is a linear function of the demographic variables \(x\)
and lagged past performance \(p_{t-1}\), plus an error term. The coefficient \(c_t\) of
\(p_{t-1}\) is assumed non-negative, because a negative coefficient would
indicate that a better past performance results in a worse current performance,
which seems unreasonable. A judgment about the appropriateness of the assumption
that the sign of (every element of) \(b_t\) is the same for all \(t\) must wait
till \(x\) is precisely defined in the next section. However, the sense of the
assumption is to make statements like the following. If being non-white has
a negative effect on school performance at age \(t\), all else equal, then it will
have a negative effect at all ages. Or, if having uneducated parents has a
negative effect on school performance at age \(t\), all else equal, then it will
have a negative effect at all ages.

The Census data used below in fitting the model are all measured at one
point in time (1960). Hence lagged values of variables are not available,
and the model cannot be fit as it stands. However, since equation (2) is simply
a first order linear difference equation in \(p_t\) (complicated by an error term
and by parameters which change with \(t\)), it can be solved for \(p_t\) as a function
of \(x\). The solution is

\[ p_t = A_t + B_t' x + V_t \tag{3} \]

where

\[
\begin{aligned}
A_t &= a_t + \sum_{i=2}^{t} \Big[ a_{i-1} \prod_{j=i}^{t} (c_j + 1) \Big] \\
B_t &= b_t + \sum_{i=2}^{t} \Big[ b_{i-1} \prod_{j=i}^{t} (c_j + 1) \Big] \\
V_t &= v_t + \sum_{i=2}^{t} \Big[ v_{i-1} \prod_{j=i}^{t} (c_j + 1) \Big]
\end{aligned}
\tag{4}
\]

It is assumed in this solution that a child starts out at age \(t = 0\) even with
his age group in terms of skipping ahead or flunking behind; that is, it is
assumed that \(p_0 = 0\). The solution may be checked by substituting it back in
equation (2). Substituting (3) in (1) gives

\[ r_t = (\alpha_t + \gamma_t A_{t-1}) + (\beta_t + \gamma_t B_{t-1})' x + (u_t + \gamma_t V_{t-1}) \tag{5} \]

Since there are no lagged variables in (3) and (5), these equations can be fit
to the available Census data. Since the explanatory variables \(x\) are exogenous,
ordinary least squares, or regression analysis, is an appropriate estimation
technique; it will be used below. Since the coefficients in (3) and (5) are
specific to various age levels (various values of \(t\)), the equations will be
fit for each of a series of age groups. Hence the fits will give estimates of
\(A_t\), \(B_t\), \(\alpha_t + \gamma_t A_{t-1}\), and \(\beta_t + \gamma_t B_{t-1}\) for various values of \(t\). The model
yields some predictions about these sets of estimates.
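
As a numerical check on the solution, the recursion implied by equation (2), written as p_t = (1 + c_t) p_{t-1} + a_t + b_t'x + v_t with p_0 = 0, can be iterated directly and compared with the closed form p_t = A_t + B_t'x + V_t. The sketch below is only an illustration: the function names and the parameter values are hypothetical, and b_t'x is passed in as a single number for each age.

```python
import math


def recursive_p(a, bx, c, v):
    """Iterate p_t = (1 + c_t) p_{t-1} + a_t + b_t'x + v_t, starting at p_0 = 0.

    Each argument is a list indexed by age t = 1, ..., T (position 0 holds t = 1);
    bx holds the scalar products b_t'x.
    """
    p = 0.0
    for a_t, bx_t, c_t, v_t in zip(a, bx, c, v):
        p = (1.0 + c_t) * p + a_t + bx_t + v_t
    return p


def closed_form_term(z, c):
    """Z_T = z_T + sum_{i=2}^{T} z_{i-1} * prod_{j=i}^{T} (c_j + 1).

    With z = a, b'x, or v this gives A_T, B_T'x, or V_T respectively.
    """
    T = len(z)
    total = z[T - 1]
    for i in range(2, T + 1):
        weight = math.prod(c[j - 1] + 1.0 for j in range(i, T + 1))
        total += z[i - 2] * weight
    return total


if __name__ == "__main__":
    # Hypothetical parameter paths for three ages (illustration only).
    a = [0.10, -0.20, 0.30]
    bx = [0.05, 0.02, 0.01]
    c = [0.00, 0.50, 0.25]
    v = [0.00, 0.10, -0.10]
    direct = recursive_p(a, bx, c, v)
    closed = closed_form_term(a, c) + closed_form_term(bx, c) + closed_form_term(v, c)
    print(direct, closed)
```

The two numbers agree up to rounding, which is the substitution check mentioned in the text carried out numerically.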

It follows from (4) that

\[
\begin{aligned}
B_t &= B_{t-1} + c_t B_{t-1} + b_t \\
V_t &= V_{t-1} + c_t V_{t-1} + v_t
\end{aligned}
\tag{6}
\]

Since, by assumption, sign \(b_t\) is the same for all \(t\) and \(c_t \ge 0\) for all \(t\), it
follows from (4) that sign \(B_t\) is the same for all \(t\). These facts, plus the
first of equations (6), imply that \(|B_t| > |B_{t-1}|\) for all \(t\). Finally, the
second of equations (6) implies that the variance of \(V_t\) will be greater than
the variance of \(V_{t-1}\) for all \(t\), assuming no substantial negative covariances
among the \(v_t\), which seems reasonable. Thus, the following predictions may be
made about the various age-group fits of equation (3):

a. The coefficients (except the constant term) will have the same
   signs in each fit. (Sign \(B_t\) will be the same for all \(t\).)

b. The absolute values of the coefficients (except the constant term)
   will get larger for more advanced age groups. (\(|B_t| > |B_{t-1}|\) for all \(t\).)

c. The error variance of the equation will get larger for more advanced
   age groups. [\(\mathrm{Var}(V_t) > \mathrm{Var}(V_{t-1})\).]



Very briefly and heuristically, these predictions may be rationalized as
follows. Since \(p_t\) is a cumulative measure of school performance, the
associated coefficients and error variance in equation (3) may also be
expected to cumulate; and this is essentially all the predictions say.

2. The Data

The data come from one of the 1960 Census special reports on education [4].
Table 5 of this report, constructed from a five percent sample of the total
U. S. population, gives data on school enrollment for each of seven age groups
of children: 5 years, 6 years, 7-9 years, 10-13 years, 14-15 years, 16-17 years,
and 18-19 years. Data on skip-flunk patterns are also presented for each of
the age groups except the 5-year-olds and 6-year-olds, who have not yet had
time to establish a skip-flunk pattern. The age groups stop at age 19 because,
after that age, too few children are still living with their parents; and thus
the Census, which is taken on a family-by-family basis, does not contain matched
data on children and parents.

For each of the age groups, the children are cross-classified by:

a. 2 racial categories 

b. 2 sex categories 

c. 3 rural-urban categories 

d. 3 education of parents categories 

e. 4 income of parents categories 

Then, for each age group there are \(2 \times 2 \times 3 \times 3 \times 4 = 144\) mutually exclusive cells
containing data on enrollment and skip-flunk patterns. The number of children
in a given cell will be referred to as the cell size. These 144 cells serve
as the 144 observations in the regressions of \(r_t\) on \(x\) and \(p_t\) on \(x\) for each age
group (each value of \(t\)). Though \(r_t\) is an unknown probability and \(p_t\) a difference
of unknown probabilities, \(r_t\) and \(p_t\) can nonetheless be approximated with good
accuracy for a given one of the 144 cells of a given age group by the following
empirical definitions:

\[ r_t = \frac{\text{students enrolled in school}}{\text{total students}} \]

\[ p_t = \frac{\text{students enrolled and ahead of their age group}}{\text{total students enrolled}} - \frac{\text{students enrolled and behind their age group}}{\text{total students enrolled}} \]

The ratios on the right of these equalities are sample proportions; and it is
well known that the error variance in measuring a true probability \(P\) by a
sample proportion is \(P(1-P)/m\), where \(m\) is the sample size (here the cell size)
on which the measure is based. Since the cell size \(m\) is almost without
exception very large for all the 144 cells, or observations, of a given
regression, \(r_t\) and \(p_t\) will be gotten with good accuracy by these measures.
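
To give a sense of the magnitudes involved, the formula P(1-P)/m can be evaluated for a hypothetical cell. The enrollment probability of 0.9 and the cell size of 10,000 below are made-up illustrative values, not figures from the Census tables, and the function name is an assumption of this sketch.

```python
import math


def proportion_std_error(p: float, m: int) -> float:
    """Standard error of a sample proportion: sqrt(P(1 - P) / m)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if m <= 0:
        raise ValueError("cell size m must be positive")
    return math.sqrt(p * (1.0 - p) / m)


if __name__ == "__main__":
    # Hypothetical cell: true enrollment probability 0.9, cell size 10,000.
    print(proportion_std_error(0.9, 10_000))  # about 0.003
```

A standard error of about three tenths of a percentage point is small relative to the differences between cells that the regressions are meant to explain, which is the sense in which large cell sizes make the measures accurate.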

The vector \(x\) of explanatory variables to be used in the regressions of
\(r_t\) on \(x\) and \(p_t\) on \(x\) is defined by the following series of zero-one, or dummy,
variables:

\[ x = (x_1, x_2, \ldots, x_9)' \]

where

   \(x_1\) = 1 for non-whites, 0 for whites

   \(x_2\) = 1 for females, 0 for males

   \(x_3\) = 1 for persons living outside a central
         city but not on a farm, 0 otherwise

   \(x_4\) = 1 for persons living on a farm, 0 otherwise

   \(x_5\) = 1 if parent (father, if living, otherwise mother)
         has 0 to 7 years of schooling, 0 otherwise

   \(x_6\) = 1 if parent has 8 to 11 years of schooling,
         0 otherwise

   \(x_7\) = 1 if family income is under $3000, 0 otherwise

   \(x_8\) = 1 if family income is from $3000 to $4999,
         0 otherwise

   \(x_9\) = 1 if family income is from $5000 to $6999,
         0 otherwise

The vector \(x\) of dummy variables may thus take on 144 possible values corresponding
to the 144 cells for a given age group.* It follows from the zero-one nature of
the \(x_i\) that the regressions using \(x\) as the vector of explanatory variables will
be equivalent to five-way analyses of variance where the five classifications
are color, sex, rural-urban status, parental education, and parental income.

*In making this count note that the \(x_i\) come in groups. The five groups
\((x_1)\), \((x_2)\), \((x_3, x_4)\), \((x_5, x_6)\), and \((x_7, x_8, x_9)\) represent color, sex, rural-urban
status, parental education, and parental income, respectively. No more than
one \(x_i\) in a given group can take on the value one for a given observation.
Taking account of this constraint, the five groups listed may take on 2, 2,
3, 3, and 4 possible values, respectively. Thus, the complete vector \(x\) may
take on \(2 \times 2 \times 3 \times 3 \times 4 = 144\) values.

The observation described by the condition that all the \(x_i\) are zero (\(x = 0\))
is the one for white males living in a central city, whose parents have a high
school or better education and a $7000 or better income. The expected value
of the dependent variable for this observation is simply the constant term of
the regression. It is convenient to think of this observation as a benchmark
observation, and to think of the constant term as a benchmark value. Then the
coefficient of a given \(x_i\) in a regression may be thought of as a deviation
from the benchmark value caused by the characteristic associated with that
variable.
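
The count of 144 cells and the role of the benchmark observation can be made concrete by enumerating every admissible value of the dummy vector. The following is a minimal sketch; the names `GROUPS`, `group_patterns`, and `all_cells` are hypothetical labels introduced only for this illustration.

```python
from itertools import product

# Number of dummy variables in each group, in the order x1, ..., x9.
GROUPS = {
    "color": 1,             # x1
    "sex": 1,               # x2
    "rural_urban": 2,       # x3, x4
    "parent_education": 2,  # x5, x6
    "parent_income": 3,     # x7, x8, x9
}


def group_patterns(n_dummies):
    """Admissible 0/1 patterns for one group: all zeros, or exactly one 1."""
    patterns = [tuple([0] * n_dummies)]
    for k in range(n_dummies):
        patterns.append(tuple(1 if j == k else 0 for j in range(n_dummies)))
    return patterns


def all_cells():
    """Every admissible value of the 9-element dummy vector x."""
    per_group = [group_patterns(n) for n in GROUPS.values()]
    # Concatenate the per-group patterns into one flat tuple per cell.
    return [sum(combo, ()) for combo in product(*per_group)]


if __name__ == "__main__":
    cells = all_cells()
    print(len(cells))   # 144
    print(cells[0])     # the benchmark cell, all zeros
```

The all-zero tuple corresponds to the benchmark observation, so in a regression on these dummies the constant term is the fitted value for that cell.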




3. The Use of Weighted Regression 

Preliminary versions of the regressions to be presented below indicated
a serious heteroskedasticity (unequal error variance) problem. It was found
that the absolute values of the 144 residuals for a given regression tended
to be negatively related to the corresponding cell sizes. That is, the residual
error variance appeared to be negatively related to the cell size. Partly this
problem might be traced to a decline of the measurement error in the dependent
variables \(r_t\) and \(p_t\) as the cell size increases (as discussed in the last section).
However, this appeared not to be a sufficient explanation; much of the problem
appeared to be due to a genuine heteroskedasticity in the error terms of the
underlying model. An easy way to take account of a negative relation between
the error variance and the cell size is to assume the relation takes the exact
form \(\sigma_i^2 = \sigma^2 / w_i\), where \(\sigma_i^2\) is the error variance of the i-th observation in
the regression, \(\sigma^2\) is a constant, and \(w_i\) is the cell size of the observation.
This relation leads, via standard least squares theory, to a straightforward
weighted regression with the \(w_i\) the weights (see for instance [2, pp. 231-36]).
All regressions reported below will be such weighted regressions. This method
of handling heteroskedasticity is a compromise with computational ease, since
a relation between \(\sigma_i^2\) and \(w_i\) other than \(\sigma_i^2 = \sigma^2 / w_i\) might well be more faithful
to the data, though more difficult computationally. However, it is comforting
to recall that heteroskedasticity by itself does not cause bias in estimated
regression coefficients.

The sum of squares minimized by a weighted regression is \(\sum_i w_i (Y_i - Y_i^*)^2\),
where \(Y_i\) and \(Y_i^*\) are the actual and predicted values of the dependent variable
for the i-th observation. This suggests the following \(R^2\) formula, where \(\bar{Y}\) is
the weighted sample mean of the \(Y_i\):

\[ R^2 = 1 - \frac{\sum_i w_i (Y_i - Y_i^*)^2}{\sum_i w_i (Y_i - \bar{Y})^2} \]

All \(R^2\)'s reported below are computed according to this formula.
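
The weighted regression and the weighted R-squared just described can be sketched in a few lines. This is an illustration under stated assumptions, not the computation actually used for the tables: the function names are hypothetical, the normal equations are solved by a plain Gaussian elimination to keep the example self-contained, and the four observations in the usage example are invented.

```python
def weighted_least_squares(X, y, w):
    """Coefficients minimizing sum_i w_i (y_i - x_i'b)^2.

    X is a list of rows (each row includes a leading 1.0 for the constant),
    y the dependent variable, w the cell-size weights.
    """
    k = len(X[0])
    # Normal equations (X'WX) b = X'Wy, stored as an augmented matrix.
    M = [[sum(wi * row[p] * row[q] for row, wi in zip(X, w)) for q in range(k)]
         + [sum(wi * row[p] * yi for row, yi, wi in zip(X, y, w))]
         for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(M[idx][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for lower in range(col + 1, k):
            factor = M[lower][col] / M[col][col]
            for cc in range(col, k + 1):
                M[lower][cc] -= factor * M[col][cc]
    # Back substitution.
    b = [0.0] * k
    for p in range(k - 1, -1, -1):
        tail = sum(M[p][q] * b[q] for q in range(p + 1, k))
        b[p] = (M[p][k] - tail) / M[p][p]
    return b


def weighted_r_squared(X, y, w, b):
    """1 - sum w (Y - Y*)^2 / sum w (Y - Ybar)^2, with Ybar the weighted mean."""
    fitted = [sum(xj * bj for xj, bj in zip(row, b)) for row in X]
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented data: a constant and one dummy; weights play the role of cell sizes.
    X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
    y = [1.0, 3.0, 5.0, 9.0]
    w = [1.0, 3.0, 1.0, 1.0]
    b = weighted_least_squares(X, y, w)
    print(b, weighted_r_squared(X, y, w, b))
```

With a single dummy, the constant is the weighted mean of the benchmark group and the dummy coefficient is the difference between the two weighted group means, which mirrors the benchmark interpretation of the coefficients given in section 2.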
Table 2. Regressions with the Progress Rate \(p_t\) the Dependent Variable
Coefficients and (in Parentheses) Coefficient Standard Deviations of

4. Regression Fits of Equation (3)

Table 2 presents a series of age group regressions of the progress rate
\(p_t\) on the explanatory variables \(x\) [that is, fits of equation (3)]. Every
category of explanatory variable (color, sex, urban-rural status, education
of parents, and income of parents) is highly significant, and the coefficient
values bear out the predictions stated at the end of section 1. That is, with
few exceptions, the coefficients of the successive age group regressions do in
fact have the same signs and do in fact get larger in absolute value. The
prediction that there would be an increase with age in the variance of the error
term of equation (3) did not seem reasonably testable with the regressions
reported here, due to the serious heteroskedasticity problem discussed in the
last section. The \(R^2\)'s indicate a fairly good level of explanation. However,
the \(R^2\)'s should be interpreted with caution. They measure the regression's
ability to predict probabilities involved in a child's school behavior, not
the behavior itself. The behavior itself for a single child may still be
quite unpredictable (just as an accurate knowledge of the probability of
flipping heads with a coin does not imply an ability to predict accurately
the outcome of a single flip). On the other hand, in making predictions
about a sizable group of children, an accurate notion of probabilities does
imply an ability to predict accurately the percentages of the group that
will behave in given ways.



Inspection of the coefficients of individual variables suggests the
following comments:

1. The parental income and education variables have a
   positive and highly significant influence on school
   performance \(p_t\). (The coefficients themselves are
   negative because the benchmark group, with respect
   to which the dummy variables are defined, has parents
   in the highest income and education category.) The
   importance of this intergenerational effect may be
   explained a number of ways. Partly, the parental
   variables may measure the quality of home education,
   which complements school education in determining
   school performance. Partly, the parental variables
   may stand as proxies for attitudinal variables which
   determine how hard children try in school. Partly,
   the parental variables may stand as proxies for innate
   intelligence of parents, which is to some extent
   genetically bequeathed to children. And so on. The
   diversity of effects for which the parental variables
   may stand makes it hard to say what effect income
   supplements to poor parents would have on their children.
   Taking the coefficients at simple face value
   suggests that a poverty policy which pushed all
   families in the less than $3000 income category into
   the $3000 to $5000 category would have a significant
   impact on the children's school performance.



2. The positive significance of the female dummy indicates
   that girls tend to do better in school than boys; and
   the coefficients are substantial in size. This is a
   surprisingly strong result in view of the mixed evidence
   from psychologists on sex differences in children's
   abilities. (See for instance [1, pp. 9-10] and references
   there.)

3. The coefficients of the non-white dummy are negative and
   significant, as would be expected. This non-white effect
   is measured with other variables held equal. It should
   be noted that other variables are typically not equal
   for non-white children, who are very likely to have low
   parental education and income also working against them.
   Similar all-else-not-equal considerations apply to judgments
   about the orders of magnitude of all the coefficients.

4. The coefficients for the two rural-urban dummies have the
   same sign and very rough order of magnitude in the various
   regressions; this is perhaps because the category of residence
   neither on a farm nor in a central city is made up largely of a
   population (country towns, small cities, and rural non-farm)
   which is similar in character to the farm population.



The systematic pattern of cumulating coefficient sizes on the regressions of
Table 2 makes it possible to combine all five regressions into a single simplified
regression based on all 5(144) = 720 observations. Suppose as a first
simplification that the three rural-urban status categories are reduced to two:
central city and rural (defined as not central city). This simplification, which
is suggested by the similarity of coefficients of the two rural-urban dummies on
Table 2, allows rural-urban status to be handled by a single dummy, call it \(x_{RUR}\),
which equals 0 for central city children, one otherwise. As a second simplification,
suppose education and income of parents are handled by continuous
variables \(E\) and \(Y\) instead of by two sets of dummy variables, where \(E\) is the
number of years of schooling completed by a child's parent (father if living,
otherwise mother) and \(Y\) is the income of the child's family.* With these
simplifications, the relation between \(p_t\) and the explanatory variables for
a given age group might be specified as

\[ p_t = \beta_0 + \beta_1 x_{NW} + \beta_2 x_{FEM} + \beta_3 x_{RUR} + \beta_4 \ln(E) + \beta_5 \ln(Y) \]

Here \(x_{NW}\) and \(x_{FEM}\) are the dummy variables for non-white-ness and female-ness;
and \(E\) and \(Y\) have been included in logarithmic form because preliminary results
suggested it. It is known from Table 2 that the coefficients in such a
relation get larger for successive age groups of children. Suppose this cumulative
effect is approximated by making each of the \(\beta_i\) a linear function of age,
call it \(A\), so that \(\beta_i = \beta_{i0} + \beta_{i1} A\). Then the relation becomes

\[
\begin{aligned}
p_t = {} & (\beta_{00} + \beta_{01} A) + (\beta_{10} + \beta_{11} A)\, x_{NW} + (\beta_{20} + \beta_{21} A)\, x_{FEM} \\
& + (\beta_{30} + \beta_{31} A)\, x_{RUR} + (\beta_{40} + \beta_{41} A) \ln(E) + (\beta_{50} + \beta_{51} A) \ln(Y)
\end{aligned}
\]

It may be seen that there are 12 \(\beta_{ij}\) to be estimated in this relation and that,
if one multiplies through the parentheses, there will be just enough terms to
estimate the 12 \(\beta_{ij}\) by ordinary regression analysis. Such a regression was in
fact fit, where the values (8, 11.5, 14.5, 16.5, 18.5) were assigned to \(A\) for
each of the five age groups respectively; and where the observations in the
regression were weighted by the corresponding cell sizes.
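
Multiplying through the parentheses gives twelve regressors: each of the six basic terms (constant, non-white dummy, female dummy, rural dummy, log education, log income) enters once by itself and once multiplied by age. A minimal sketch of how one row of such a design matrix could be assembled is given below; the function name `pooled_design_row` and the ordering of the columns are assumptions of this illustration, not a description of the original computation.

```python
import math

# Age values assigned to the five age groups in the text.
AGE_VALUES = (8.0, 11.5, 14.5, 16.5, 18.5)


def pooled_design_row(age, x_nw, x_fem, x_rur, educ_years, income):
    """Return the 12 regressors for one cell of the pooled regression.

    Columns come in pairs (z, z * age) for z in:
    constant, x_NW, x_FEM, x_RUR, ln(E), ln(Y).
    """
    base = [1.0, float(x_nw), float(x_fem), float(x_rur),
            math.log(educ_years), math.log(income)]
    row = []
    for z in base:
        row.extend([z, z * age])
    return row


if __name__ == "__main__":
    # Example cell: youngest age group, white male in a central city,
    # parent assigned E = 13 years and family assigned Y = 10000 dollars.
    row = pooled_design_row(AGE_VALUES[0], 0, 0, 0, 13.0, 10000.0)
    print(len(row))  # 12
    print(row)
```

Regressing the progress rate on these twelve columns, with cell sizes as weights, yields one estimate for each of the twelve parameters.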

*Since the parental education data come in discrete categorizations,
the following assumptions were used in constructing \(E\). A parent with 0-7
years of schooling was assigned an E-value of \(E = 4\); a parent with 8-11
years was assigned \(E = 9.5\); and a parent with 12 or more years was assigned
\(E = 13\). Similarly, in constructing \(Y\), the parental income categories 0-3000,
3-5000, 5-7000, and 7000+ were assigned \(Y = 1500\), \(Y = 4000\), \(Y = 6000\), and
\(Y = 10000\) respectively.







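The regression result, based on 720 observations, is given below (coefficient
standard deviations appear in parentheses under each coefficient):

\[
\begin{aligned}
p_t = {} & -[\underset{(.050)}{.119} + \underset{(.0071)}{.0985}\, A] + [\underset{(.0018)}{.0605} - \underset{(.0017)}{.0162}\, A]\, x_{NW} \\
& + [\underset{(.00712)}{.00512} + \underset{(.00102)}{.00758}\, A]\, x_{FEM} - [\underset{(.0080)}{.0309} + \underset{(.0011)}{.0019}\, A]\, x_{RUR} \\
& + [\underset{(.0090)}{.0205} + \underset{(.0012)}{.0158}\, A] \ln(E) + [\underset{(.0060)}{.0168} + \underset{(.00084)}{.00514}\, A] \ln(Y)
\end{aligned}
\]

The \(R^2\) for this regression is .87, which compares fairly well with a pooled \(R^2\)
of .94 for all five regressions on Table 2.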
5. Regression Fits of Equation (5)

Table 3 presents regressions of the enrollment rate \(r_t\) on the explanatory
variables \(x\).* The \(R^2\)'s indicate a fairly good level of explanation, and the
coefficients are in general highly significant. It may be expected that the
various institutional constraints faced by the various age groups of children
will influence the regressions. The five-year-olds are too young to fall
under the compulsory school attendance laws; and those that do attend school
typically do so at their own, rather than the public's, expense. This is also
true to some extent for the six-year-olds. For the 7-9 and 10-13 year-olds,
however, school is compulsory and free. For the 14-15 and 16-17 year-olds,
schooling is typically still free; but the compulsory attendance laws either
no longer apply or are more difficult to enforce; and the opportunity costs
from other occupations start to rise. Finally, the 18-19 year-olds are of
beginning college age; and schooling is typically no longer free.

Inspection of the coefficients suggests the following comments— 

1. For the age groups which face the same free-compulsory
   school situation (7-9, 10-13, 14-15, and 16-17), a
   rough pattern of cumulating coefficient values is observed
   in the successive regressions. This is the same pattern
   as observed for the \(p_t\)-regressions of Table 2; and the
   theoretical rationale suggested at the end of section 1
   may apply here as well.

2. By far the most important explanatory variables are the
   education of parents dummies, particularly for the important
   last three age groups, which cover the years when
   more than half the students drop out of school.

3. The coefficients of the income dummies behave predictably 
for all except the last age group, where they become insig- 
nificant. This is a puzzling result, since income would 
seem to be particularly important for the age group which 
is first facing college expenses. 



*Closely related regressions may be found in Chapters 24 and 25 of [3].
There the dependent variable is an index of years of schooling completed; 
the observations are for individuals rather than groups; and the list of 
explanatory variables is much more detailed. 
Table 3. Regressions with the Enrollment Rate \(r_t\) the Dependent Variable
Coefficients and (in parentheses) Coefficient Standard Deviations of

4. The non-white dummy is generally significant, taking
   a positive sign for the youngest and oldest age groups,
   and a negative sign otherwise. Since the youngest and
   oldest age groups bear much of their schooling cost
   personally, this sign pattern suggests that, other
   variables constant, non-whites may be more willing than
   whites to sacrifice other expenditures for school
   expenditures. Perhaps this is because a non-white is,
   relative to his social context, richer than a white
   with the same education and income; and thus he is
   better able to afford extra educational expenditures
   for his children. (Another hypothesis is that non-whites
   in the 18-19 age group have a higher enrollment
   rate, other variables the same, because proportionally
   more of them have fallen behind scholastically and are
   still finishing high school. A test of this hypothesis
   can be gotten by adding the relative progress variable
   \(p_t\) as an additional explanatory variable in the regressions.
   If, after controlling on \(p_t\), the sign pattern
   of the non-white dummy still remains, it suggests that
   the hypothesis is only a partial explanation at best.
   This turns out to be the case, as the regressions of
   Table 4 below will show.)






5. The female dummy is significant for only the 16-17 and 
18-19 age groups, with a positive and negative coeffi- 
cient for the two groups, respectively. The positive 
sign for the 16-17 age group (terminal high school years) 
is perhaps due to a girl's lesser impatience to quit 
school and get a job; while the negative sign for the 
18-19 age group (beginning college years) is perhaps 
due to society's relative reluctance to invest a 
college education in a prospective housewife. 



6. The two rural-urban status dummies have the same sign
   and the same general order of magnitude in the various
   regressions. This is the same pattern as observed in
   Table 2, and the same suggested rationale applies here.
   The negative significance of these dummies for the 5
   and 6 age groups is perhaps due to the difficulty of
   getting pre-school age rural children to a kindergarten
   or other pre-school. A convincing rationale for the
   positive significance of these dummies for the older
   age groups seems difficult to find.

6. Supplementary Regressions

The paper will be concluded with several additional regressions bearing
on some minor points. In the original statement of the model in section 1,
the enrollment probability \(r_t\) was assumed to depend partly on the lagged
progress variable \(p_{t-1}\) as follows: \(r_t = \alpha_t + \beta_t' x + \gamma_t p_{t-1} + u_t\), where \(\gamma_t\) was
hypothesized to be positive. Since data on the lagged variable were unavailable,
a solution form was found which expressed \(r_t\) as a function of \(x\) alone
[equation (5)]. Unfortunately, in finding this solution form, the ability to
test the hypothesis \(\gamma_t > 0\) was lost; and the regressions of Table 3 do not in
fact provide such a test. However, there is another equation for \(r_t\) available
from the model, one which does not involve lagged variables and does not lose
the ability to test the hypothesis \(\gamma_t > 0\). Solving equation (2) for \(p_{t-1}\) as
a function of \(p_t\) and \(x\), and substituting this result in equation (1) gives

\[
\begin{aligned}
r_t = {} & [\alpha_t - \gamma_t a_t/(c_t+1)] + [\beta_t - \gamma_t b_t/(c_t+1)]' x \\
& + [\gamma_t/(c_t+1)]\, p_t + [u_t - \gamma_t v_t/(c_t+1)]
\end{aligned}
\]

which includes no lagged variables and is thus estimable with the available
data. Table 4 presents regressions of this form. (Since \(p_t\) is determined in
the model independently of \(r_t\), ordinary least squares, or regression
analysis, is still an appropriate estimation technique.) In terms of these
regressions, the hypothesis that \(\gamma_t > 0\) becomes the hypothesis that the
coefficient of \(p_t\) is greater than zero (under the apparently safe assumption
that the presumably positive parameter \(c_t\) is at least greater than \(-1\)). In
four of the five regressions, the coefficient of \(p_t\) is indeed significantly
positive (by a standard t-test at any conventional significance level). This
provides rough confirmation of the hypothesis \(\gamma_t > 0\).
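
For readers who wish to verify the substitution, the intermediate steps can be written out as follows. This is only a restatement of the algebra described in the text, using the symbols of equations (1) and (2); it assumes \(c_t \neq -1\) so that the division is defined.

```latex
% Step 1: equation (2), with \Delta p_t = p_t - p_{t-1}, rearranged for p_{t-1}.
\begin{align*}
p_t - p_{t-1} &= a_t + b_t' x + c_t p_{t-1} + v_t \\
(c_t + 1)\, p_{t-1} &= p_t - a_t - b_t' x - v_t \\
p_{t-1} &= \frac{p_t - a_t - b_t' x - v_t}{c_t + 1}
\end{align*}
% Step 2: substitute into equation (1) and collect terms.
\begin{align*}
r_t &= \alpha_t + \beta_t' x + \gamma_t\, \frac{p_t - a_t - b_t' x - v_t}{c_t + 1} + u_t \\
    &= \Big[\alpha_t - \frac{\gamma_t a_t}{c_t + 1}\Big]
     + \Big[\beta_t - \frac{\gamma_t b_t}{c_t + 1}\Big]' x
     + \frac{\gamma_t}{c_t + 1}\, p_t
     + \Big[u_t - \frac{\gamma_t v_t}{c_t + 1}\Big]
\end{align*}
```

Since the coefficient of \(p_t\) is \(\gamma_t/(c_t+1)\), its sign is the sign of \(\gamma_t\) whenever \(c_t > -1\), which is why a positive estimated coefficient is read as support for \(\gamma_t > 0\).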

The most important explanatory variables in the various regressions presented
were usually the education of parents variables. These variables refer to the
father's education, if he is living, otherwise the mother's. It is informative
to refit the regressions, using separate variables for the father's education
and the mother's education; the point is to see if one or the other parent
exerts a greater influence on the child. The required data may be found on
Table 4 of the same 1960 Census special report on education [4]. This table
presents, for each age group of children living with both parents, enrollment
and school performance data cross-classified by:

a. 2 color categories 

b. 2 sex categories

c. 3 rural-urban categories 

d. 10 education of father and mother categories 

Thus, there are \(2 \times 2 \times 3 \times 10 = 120\) mutually exclusive cells, which serve as the
observations for the regressions presented on Table 5. Since no data on the
incomes of parents were available, these regressions are only roughly comparable
to the previous regressions. Results are reported only for the two age groups 
16-17 and 18-19. In these regressions, the education of parents data were 
translated into two quantitative variables, defined as years of schooling 
completed by mother and by father; Table 6 shows how the translation was made. 

Table 5 suggests that the educations of a child's father and mother are 
of roughly equal importance in determining \(p_t\) and \(r_t\). Though the coefficient
of the father's education variable is larger in all four regressions on Table 5, 
the differences are not substantial. They could easily be due to specification 
bias; the father's education variable may be picking up much of the effect of 
the excluded income variable. 




Table 5. Regressions Measuring Separate Effects of Father's and Mother's Educations

Table 6. Translation of Census Parental Education
Categories into Quantitative Regression Variables